52 research outputs found
Scaling Multidimensional Inference for Big Structured Data
In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications [151]. In a
world of increasing sensor modalities, cheaper storage, and more data oriented questions, we are quickly passing the limits of tractable computations using traditional statistical analysis
methods. Methods which often show great results on simple data have difficulties processing complicated multidimensional data. Accuracy alone can no longer justify unwarranted memory
use and computational complexity. Improving the scaling properties of these methods for multidimensional data is the only way to make these methods relevant. In this work we explore methods for improving the scaling properties of parametric and nonparametric
models. Namely, we focus on the structure of the data to lower the complexity of a specific family of problems. The two types of structures considered in this work are distributive
optimization with separable constraints (Chapters 2-3), and scaling Gaussian processes for multidimensional lattice input (Chapters 4-5). By improving the scaling of these methods, we can expand their use to a wide range of applications which were previously intractable
open the door to new research questions
Chance-Constrained Outage Scheduling using a Machine Learning Proxy
Outage scheduling aims at defining, over a horizon of several months to
years, when different components needing maintenance should be taken out of
operation. Its objective is to minimize operation-cost expectation while
satisfying reliability-related constraints. We propose a distributed
scenario-based chance-constrained optimization formulation for this problem. To
tackle tractability issues arising in large networks, we use machine learning
to build a proxy for predicting outcomes of power system operation processes in
this context. On the IEEE-RTS79 and IEEE-RTS96 networks, our solution obtains
cheaper and more reliable plans than other candidates
Recommended from our members
Scaling Multidimensional Inference for Structured Gaussian Processes
Exact Gaussian process (GP) regression has O(N-3) runtime for data size N, making it intractable for large N. Many algorithms for improving GP scaling approximate the covariance with lower rank matrices. Other work has exploited structure inherent in particular covariance functions, including GPs with implied Markov structure, and inputs on a lattice (both enable O(N) or O(N log N) runtime). However, these GP advances have not been well extended to the multidimensional input setting, despite the preponderance of multidimensional applications. This paper introduces and tests three novel extensions of structured GPs to multidimensional inputs, for models with additive and multiplicative kernels. First we present a new method for inference in additive GPs, showing a novel connection between the classic backfitting method and the Bayesian framework. We extend this model using two advances: a variant of projection pursuit regression, and a Laplace approximation for non-Gaussian observations. Lastly, for multiplicative kernel structure, we present a novel method for GPs with inputs on a multidimensional grid. We illustrate the power of these three advances on several data sets, achieving performance equal to or very close to the naive GP at orders of magnitude less cost
- …